    Low Cost, Cross-language and Cross-platform Information Retrieval and Documentation Tools

    In this paper we focus on the design and implementation of low-cost, cross-language and cross-platform Information Retrieval and Documentation tools capable of collecting, organizing and administering unstructured and semi-structured information imported from various sources. A modular Computer Assisted Information Resources Navigation (CAIRN) software architecture is proposed, and the requirements of each module are presented. The discussion of the implementation is based on experimentation with a prototype of such a software tool. The technologies incorporated into modern operating systems, and the opportunities they offer for implementing the modules of the CAIRN architecture, are also examined and evaluated. Some of these technologies are common to, or independent of, the operating systems, while others are specific to a particular system. In the latter case we face barriers (restrictions) to a straightforward implementation of CAIRN software systems across the whole range of desktop operating systems (e.g. Windows, Mac OS, Linux, Solaris). Some alternative technologies are presented to overcome this serious constraint. The implementation effort is also evaluated, and finally some conclusions and future plans for further improving the CAIRN architecture are given.
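
    The abstract does not name the CAIRN modules, so the following is only a minimal Python sketch, under stated assumptions, of one way to handle the split it describes between portable and OS-specific technologies. Each capability (the capability name `text_extraction` and the class `CapabilityRegistry` are hypothetical) registers backends keyed by the value of `platform.system()` (for example "Windows", "Darwin", "Linux" or "SunOS"), plus a portable fallback under the key "*" that is used when no native backend exists.

```python
import platform
from typing import Callable, Dict, Optional

# A backend turns some input (here, a file path) into extracted text.
Backend = Callable[[str], str]


class CapabilityRegistry:
    """Hypothetical registry of OS-specific and portable backends."""

    def __init__(self) -> None:
        # capability name -> {OS name or "*" -> backend}
        self._backends: Dict[str, Dict[str, Backend]] = {}

    def register(self, capability: str, os_name: str, backend: Backend) -> None:
        # os_name is a platform.system() value such as "Windows", "Darwin",
        # "Linux" or "SunOS", or "*" for the portable fallback.
        self._backends.setdefault(capability, {})[os_name] = backend

    def resolve(self, capability: str, os_name: Optional[str] = None) -> Backend:
        # Prefer a native backend for the current (or given) OS, else the
        # portable fallback; fail loudly if neither exists.
        impls = self._backends.get(capability, {})
        os_name = os_name or platform.system()
        if os_name in impls:
            return impls[os_name]
        if "*" in impls:
            return impls["*"]
        raise LookupError(f"no backend for {capability!r} on {os_name}")


if __name__ == "__main__":
    registry = CapabilityRegistry()
    # Placeholder backends; real ones would call platform services.
    registry.register("text_extraction", "*", lambda path: "portable:" + path)
    registry.register("text_extraction", "Darwin", lambda path: "native-mac:" + path)
    print(registry.resolve("text_extraction", "Linux")("notes.rtf"))  # portable:notes.rtf
```

    Keeping the fallback in the same registry lets a module prefer a native technology where one exists without losing portability on the other desktop systems.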

    Asia Minor Greek: Towards a Computational Processing

    In this paper, we discuss issues concerning the computational aspect of an ongoing research project that aims to provide a systematic study of three Greek dialects of Asia Minor ("Pontus, Cappadocia, Aivali: In search of Asia Minor Greek", AmiGre). The project constitutes the first attempt to describe dialectal phenomena at the phonological, morphological, and structural levels. It is also the first attempt in Greece to combine Informatics and Theoretical Linguistics in order to facilitate this task. Our aim here is to present the design principles of the project's computational component, namely an electronic dictionary and a multimedia database that would provide an innovative mechanism for storing, processing and retrieving oral and written dialectal data.
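
    The abstract gives design goals rather than a schema. As a minimal Python sketch, assuming a hypothetical schema (`Entry`, `DialectLexicon` and all field names are illustrative, not the AmiGre design), a dictionary entry could link a lemma to its written form in one dialect and, optionally, to an oral recording, with an index supporting retrieval by lemma and dialect.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, List, Optional


@dataclass(frozen=True)
class Entry:
    """One dialectal attestation of a lemma (hypothetical schema)."""
    lemma: str                   # headword shared across dialects
    dialect: str                 # e.g. "Pontic", "Cappadocian", "Aivaliot"
    form: str                    # written (transcribed) dialectal form
    gloss: str                   # meaning of the form
    audio: Optional[str] = None  # hypothetical path to an oral recording


class DialectLexicon:
    """In-memory index from lemma to its dialectal entries."""

    def __init__(self) -> None:
        self._by_lemma: DefaultDict[str, List[Entry]] = defaultdict(list)

    def add(self, entry: Entry) -> None:
        self._by_lemma[entry.lemma].append(entry)

    def lookup(self, lemma: str, dialect: Optional[str] = None) -> List[Entry]:
        # All attestations of a lemma, optionally restricted to one dialect.
        # .get() avoids inserting empty lists for unknown lemmas.
        return [e for e in self._by_lemma.get(lemma, [])
                if dialect is None or e.dialect == dialect]

    def oral(self, lemma: str) -> List[Entry]:
        # Only the attestations backed by an audio recording.
        return [e for e in self.lookup(lemma) if e.audio is not None]
```

    Any values used with this sketch should be treated as placeholders rather than real dialectal data; a real multimedia database would also need persistent storage and search over transcriptions, which this in-memory version omits.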

    Pronominal and Anaphor Resolution

    Chomsky's Binding Conditions imply that no position that can hold the antecedent of an anaphor can hold the antecedent of a pronominal. However, examples have been given in the literature where this implication does not hold. A brief review of the linguistic background, and of the various definitions relevant to this problem given by other researchers, is presented. Careful examination of these definitions leads to the conclusion that the problem can be resolved by considering two different definitions of the governing category: one for the governing category of a pronominal and one for that of an anaphor. The selected definitions and rules are used to design computer algorithms. Using the Binding Conditions alone, only the impossible antecedents of a pronominal can be found. Other algorithms, which combine the information available from these conditions in order to find the possible antecedents of any pronominal, are also suggested.
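
    The abstract states that the Binding Conditions alone can only rule antecedents out for a pronominal. The Python sketch below illustrates that idea on a toy tree. It uses deliberately simplified textbook notions (c-command via the node immediately above a constituent, and a governing category taken to be the smallest dominating node whose label is in `labels`, by default "S"), not the two refined governing-category definitions the paper argues for; the separate `labels` argument only loosely mirrors the idea that pronominals and anaphors may use different definitions. All class and function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)
class Node:
    """A constituent in a toy phrase-structure tree."""
    label: str                      # e.g. "S", "NP", "VP", "V"
    word: str = ""                  # surface word for leaf-like nodes
    children: List[Node] = field(default_factory=list, repr=False)
    parent: Optional[Node] = field(default=None, repr=False)

    def add(self, *kids: Node) -> Node:
        for kid in kids:
            kid.parent = self
            self.children.append(kid)
        return self

    def dominates(self, other: Node) -> bool:
        # Proper dominance: self is a strict ancestor of other.
        node = other.parent
        while node is not None:
            if node is self:
                return True
            node = node.parent
        return False


def c_commands(a: Node, b: Node) -> bool:
    # Simplified c-command: neither node dominates the other and the node
    # immediately above a (assumed to be branching) dominates b.
    if a is b or a.parent is None or a.dominates(b) or b.dominates(a):
        return False
    return a.parent.dominates(b)


def governing_category(np: Node, labels: Tuple[str, ...] = ("S",)) -> Optional[Node]:
    # Simplification: the smallest node above np whose label is in labels.
    node = np.parent
    while node is not None and node.label not in labels:
        node = node.parent
    return node


def noun_phrases(root: Node) -> List[Node]:
    # Pre-order, left-to-right traversal collecting NP nodes.
    found, stack = [], [root]
    while stack:
        node = stack.pop()
        if node.label == "NP":
            found.append(node)
        stack.extend(reversed(node.children))
    return found


def local_binders(root: Node, target: Node, labels: Tuple[str, ...] = ("S",)) -> List[Node]:
    # NPs that c-command target inside its governing category.
    gc = governing_category(target, labels)
    if gc is None:
        return []
    return [np for np in noun_phrases(root)
            if np is not target and gc.dominates(np) and c_commands(np, target)]


def impossible_for_pronominal(root: Node, pron: Node, labels=("S",)) -> List[Node]:
    # Condition B: a pronominal must be free in its governing category,
    # so every local binder is an impossible antecedent.
    return local_binders(root, pron, labels)


def possible_for_anaphor(root: Node, ana: Node, labels=("S",)) -> List[Node]:
    # Condition A: an anaphor must be bound in its governing category.
    return local_binders(root, ana, labels)


def example(object_word: str) -> Tuple[Node, Node]:
    # Toy tree for "John thinks Bill likes <object_word>".
    obj = Node("NP", object_word)
    inner = Node("S").add(Node("NP", "Bill"), Node("VP").add(Node("V", "likes"), obj))
    outer = Node("S").add(Node("NP", "John"), Node("VP").add(Node("V", "thinks"), inner))
    return outer, obj


if __name__ == "__main__":
    root, him = example("him")
    print([n.word for n in impossible_for_pronominal(root, him)])  # ['Bill']
```

    For "John thinks Bill likes him" only "Bill" is excluded, so "John" remains a candidate; deciding which of the remaining candidates are actually possible antecedents is the job of the further algorithms the abstract mentions.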

    Text Classification: Forming Candidate Key-Phrases from Existing Shorter Ones

    Text classification is a hard problem with various aspects and potential solutions. In this paper, two main research directions for the classification of narrative documents are considered. The first is based on data mining and rule induction techniques, while the second combines traditional Text Retrieval techniques (the vector space model, index terms, and similarity measures) with Natural Language Processing and Instance-based Learning techniques. Key-phrases can be used as attributes for mining rules or as a basis for measuring the similarity of new (unclassified) documents to existing (classified) ones. Hence, we focus on the problem of extracting key-phrases from a text collection in order to use them as attributes for text classification. A new algorithm for the discovery of key-phrases is described: candidate key-phrases are built from frequent shorter ones, with special emphasis on reducing the complexity of the algorithm.
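
    The abstract does not spell out the algorithm, so the following is a generic Apriori-style sketch in Python, not the paper's method. Phrases are grown one word at a time, and a candidate of n words is counted only if both its (n-1)-word prefix and suffix were already frequent, which prunes most candidates before counting. Whitespace tokenization, document frequency as the support measure, and the names `frequent_phrases`, `min_support` and `max_len` are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

Phrase = Tuple[str, ...]


def frequent_phrases(docs: Iterable[str], min_support: int = 2,
                     max_len: int = 4) -> Dict[Phrase, int]:
    """Level-wise (Apriori-style) discovery of frequent contiguous phrases.

    Support is document frequency: the number of documents that contain
    the phrase at least once.
    """
    tokenized = [doc.lower().split() for doc in docs]

    # Level 1: single words.
    counts = Counter((w,) for toks in tokenized for w in set(toks))
    result: Dict[Phrase, int] = {p: c for p, c in counts.items() if c >= min_support}
    frequent: Set[Phrase] = set(result)

    n = 1
    while frequent and n < max_len:
        n += 1
        counts = Counter()
        for toks in tokenized:
            seen: Set[Phrase] = set()
            for i in range(len(toks) - n + 1):
                cand = tuple(toks[i:i + n])
                # Prune: a phrase can only be frequent if both of its
                # (n-1)-word sub-phrases (prefix and suffix) were frequent.
                if cand[:-1] in frequent and cand[1:] in frequent:
                    seen.add(cand)
            counts.update(seen)  # each document counts a phrase once
        frequent = {p for p, c in counts.items() if c >= min_support}
        result.update((p, counts[p]) for p in frequent)
    return result


if __name__ == "__main__":
    docs = [
        "text classification with key phrases",
        "key phrases help text classification",
        "rule induction for text classification",
    ]
    found = frequent_phrases(docs)
    print(sorted(p for p in found if len(p) > 1))
    # [('key', 'phrases'), ('text', 'classification')]
```

    The pruning is safe because a phrase can never occur in more documents than any contiguous part of it, so discarding candidates with an infrequent prefix or suffix never loses a frequent phrase.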